An Algorithm for Suffix Sorting and Its Applications

Authors

  • Fei Nan
  • Don Adjeroh
Abstract

The suffix tree is a data structure that has found applications in various important problems, such as genetic sequencing, pattern matching and computational biology. Its derivative data structure, the suffix array, is another representation with the added advantage of a small memory footprint. We propose a simple O(n log n) time divide-and-conquer sort-and-merge algorithm for solving the suffix sorting problem. Given the suffix array, the array of Longest Common Prefix (LCP) values can be constructed in O(n) time. Our proposed algorithm distinguishes itself from existing suffix array algorithms by the use of a relatively simple partitioning scheme at the division stage. We discuss applications of suffix sorting to different problems in computational biology.
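To make the suffix sorting problem concrete, here is a minimal baseline sketch. It is not the divide-and-conquer algorithm proposed in the paper, whose details are not given in this abstract; it simply sorts suffix start positions by comparing the suffixes directly, which can take far longer than O(n log n) on repetitive inputs. The function name `naive_suffix_array` is a hypothetical choice for illustration.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, ordered
    lexicographically. Baseline only: each comparison may scan up to
    n characters, so this is not an O(n log n) method."""
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order:
    # a(5), ana(3), anana(1), banana(0), na(4), nana(2)
    print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

A baseline like this is mainly useful as a reference against which a faster suffix sorter can be checked on small inputs.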


Similar resources

Parallel Suffix Sorting

We present a parallel algorithm for lexicographically sorting the suffixes of a string. Suffix sorting has applications in string processing, data compression and computational biology. The ordered list of suffixes of a string stored in an array is known as Suffix Array, an important data structure in string processing and computational biology. Our focus is on deriving a practical implementati...


Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.
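As an illustration of how LCP values can be derived from a suffix array in linear time, the sketch below follows one widely used approach: visit suffixes in text order and reuse the previous match length minus one. This is an assumption about the general technique and is not taken from the text of the paper above. The names `lcp_from_suffix_array`, `rank`, and `h` are hypothetical.

```python
def lcp_from_suffix_array(text: str, sa: list[int]) -> list[int]:
    """Return lcp, where lcp[k] is the length of the longest common prefix
    of the suffixes starting at sa[k-1] and sa[k]; lcp[0] is 0."""
    n = len(text)
    rank = [0] * n                      # rank[i] = position of suffix i in sa
    for pos, start in enumerate(sa):
        rank[start] = pos

    lcp = [0] * n
    h = 0                               # match length carried between steps
    for i in range(n):                  # process suffixes in text order
        if rank[i] == 0:
            h = 0                       # first suffix has no predecessor
            continue
        j = sa[rank[i] - 1]             # suffix just before suffix i in sa
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1                      # next suffix shares at least h-1 chars
    return lcp


if __name__ == "__main__":
    print(lcp_from_suffix_array("banana", [5, 3, 1, 0, 4, 2]))
    # [0, 1, 3, 0, 0, 2]
```

Because `h` drops by at most one per step and never exceeds n, the total work in the inner loop is linear in the length of the text.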


Exposition and Analysis of a Suffix Sorting Algorithm

This paper focuses on the suffix sorting algorithm of Maniscalco [10], which at the time of writing is available only as C++ source code on the Internet. We will refer to the program as MSufSort. MSufSort computes the Inverse Suffix Array (ISA) of an input string, which is equivalent to computing the Suffix Array (converting one to the other is discussed in section 8). Recall that for i ∈ [0..n...
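The equivalence between the Suffix Array and the Inverse Suffix Array can be seen from the fact that each is the inverse permutation of the other. The sketch below shows that standard inversion; it is an assumption about what the conversion involves, not a reproduction of the paper's section 8 or of MSufSort. The name `invert_permutation` is hypothetical.

```python
def invert_permutation(perm: list[int]) -> list[int]:
    """Return the inverse of a permutation of 0..n-1.
    Applied to a suffix array it yields the inverse suffix array,
    and applied to the inverse suffix array it yields the suffix array."""
    inverse = [0] * len(perm)
    for rank, start in enumerate(perm):
        inverse[start] = rank
    return inverse


if __name__ == "__main__":
    sa = [5, 3, 1, 0, 4, 2]             # suffix array of "banana"
    isa = invert_permutation(sa)
    print(isa)                          # [3, 2, 5, 1, 4, 0]
    print(invert_permutation(isa))      # [5, 3, 1, 0, 4, 2]
```

The conversion is a single pass in either direction, so an algorithm that produces one array gives the other at O(n) extra cost.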


Direct Suffix Sorting and Its Applications


In-Place Suffix Sorting

Given string T = T [1, . . . , n], the suffix sorting problem is to lexicographically sort the suffixes T [i, . . . , n] for all i. This problem is central to the construction of suffix arrays and trees with many applications in string processing, computational biology and compression. A bottleneck in these applications is the amount of workspace needed to perform suffix sorting beyond the spac...




Publication date: 2006